Neuro-Dynamic Programming: An Overview

Author

  • Dimitri P. Bertsekas
Abstract

Neuro-dynamic programming (NDP for short) is a relatively new class of dynamic programming methods for control and sequential decision making under uncertainty. These methods have the potential to deal with problems that for a long time were thought to be intractable, due to either a large state space or the lack of an accurate model. They combine ideas from the fields of neural networks, artificial intelligence, cognitive science, simulation, and approximation theory. We will delineate the major conceptual issues, survey a number of recent developments, describe some computational experience, and address a number of open questions. We consider systems where decisions are made in stages. The outcome of each decision is not fully predictable but can be anticipated to some extent before the next decision is made. Each decision results in some immediate cost but also affects the context in which future decisions are to be made, and therefore affects the cost incurred in future stages. Dynamic programming (DP for short) provides a mathematical formalization of the tradeoff between immediate and future costs. Generally, in DP formulations there is a discrete-time dynamic system whose state evolves according to given transition probabilities that depend on a decision/control u. In particular, if we are in state i and we choose decision u, we move to state j with given probability pij(u). Simultaneously with this transition, we incur a cost g(i, u, j). In comparing the available decisions u, however, it is not enough to look at the magnitude of the cost g(i, u, j); we must also take into account how desirable the next state j is. We thus need a way to rank or rate states j. This is done by using the optimal cost (over all remaining stages) starting from state j, which is denoted by J∗(j). These costs can be shown to...
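A minimal sketch of how the abstract's quantities fit together, assuming a discounted infinite-horizon setting (the discount factor and the two-state, two-action problem below are illustrative, not from the paper): value iteration repeatedly applies the Bellman recursion J(i) ← min over u of Σ_j pij(u) [g(i, u, j) + γ J(j)], whose fixed point is the optimal cost J∗.

```python
# Hedged sketch: value iteration for the Bellman recursion described above,
# on a made-up 2-state, 2-action MDP. All numbers are illustrative.
import numpy as np

n_states, n_actions = 2, 2
# p[u][i][j]: probability pij(u) of moving from state i to state j under decision u
p = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.7, 0.3]]])
# g[u][i][j]: immediate cost g(i, u, j)
g = np.array([[[1.0, 2.0], [0.0, 3.0]],
              [[2.0, 0.5], [1.0, 1.0]]])
gamma = 0.95  # discount factor (an assumption; the abstract's setting is more general)

J = np.zeros(n_states)
for _ in range(1000):
    # Q[u, i] = expected cost of choosing u in state i, then continuing optimally:
    #   sum_j p[u,i,j] * (g[u,i,j] + gamma * J[j])
    Q = np.einsum('uij,uij->ui', p, g + gamma * J[None, None, :])
    J_new = Q.min(axis=0)  # minimize over decisions u
    if np.max(np.abs(J_new - J)) < 1e-9:
        break
    J = J_new

print(J)  # approximate optimal costs J*(i) for each starting state i
```

NDP methods replace the exact table J with a parametric approximation (e.g. a neural network) when the state space is too large for this tabular sweep.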

Similar Resources

Neuro-Dynamic Programming: Overview and Recent ...

Neuro-dynamic programming is comprised of algorithms for solving large-scale stochastic control problems. Many ideas underlying these algorithms originated in the field of artificial intelligence and were motivated to some extent by descriptive models of animal behavior. This chapter provides an overview of the history and state-of-the-art in neuro-dynamic programming, as well as a review of recen...

Full Text

Dual Heuristic Programming for Fuzzy Control

Overview material for the Special Session (Tuning Fuzzy Controllers Using Adaptive Critic Based Approximate Dynamic Programming) is provided. The Dual Heuristic Programming (DHP) method of Approximate Dynamic Programming is described and used to design a fuzzy control system. DHP and related techniques have been developed in the neurocontrol context but can be equally productive when used w...

Full Text

A dynamic programming approach for solving nonlinear knapsack problems

Nonlinear Knapsack Problems (NKP) are an alternative formulation of the multiple-choice knapsack problem. A powerful approach for solving NKP is dynamic programming, which may obtain the globally optimal solution even in the case of a discrete solution space for these problems. Despite the power of this solution approach, it performs very slowly when the solution space of the pr...
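The blurb does not give the NKP recursion itself, so as a stand-in, here is the classic 0/1 knapsack solved by dynamic programming over capacities, which illustrates the same stage-by-stage tradeoff (the item values and weights are made up):

```python
# Hedged sketch: dynamic programming for the 0/1 knapsack problem,
# a simple stand-in for the NKP recursion the blurb alludes to.
def knapsack(values, weights, capacity):
    # best[c] = best total value achievable with remaining capacity c
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # iterate capacities downward so each item is used at most once
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

print(knapsack([60, 100, 120], [10, 20, 30], 50))  # → 220
```

The table `best` plays the same role as the cost-to-go function in the DP formulations above: each item is a stage, and each capacity level is a state. The blurb's point about slow performance corresponds to this table growing with the size of the (discretized) solution space.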

Full Text

Feature Selection for Neuro-Dynamic Programming

Neuro-Dynamic Programming encompasses techniques from both reinforcement learning and approximate dynamic programming. Feature selection refers to the choice of basis that defines the function class that is required in the application of these techniques. This chapter reviews two popular approaches to neuro-dynamic programming, TD-learning and Q-learning. The main goal of the chapter is to demon...
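For orientation, the Q-learning method mentioned here can be sketched in its simplest tabular form, before any feature basis is introduced (the 3-state chain, rewards, and hyperparameters below are hypothetical, not from the chapter; note this sketch uses the reward-maximizing convention common in reinforcement learning rather than the cost-minimizing one of the abstract above):

```python
# Hedged sketch: tabular Q-learning on a toy 3-state chain.
# States, rewards, and parameters are illustrative only.
import random

random.seed(0)
n_states, n_actions = 3, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, eps = 0.1, 0.9, 0.2  # step size, discount, exploration rate

def step(s, a):
    # Toy dynamics: action 0 moves left, action 1 moves right; reward 1 at the right end.
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r

s = 0
for _ in range(20000):
    # epsilon-greedy action selection
    if random.random() < eps:
        a = random.randrange(n_actions)
    else:
        a = max(range(n_actions), key=lambda u: Q[s][u])
    s2, r = step(s, a)
    # Q-learning update: bootstrap on the greedy value of the next state
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
    s = 0 if s2 == n_states - 1 else s2  # restart the episode at the goal
```

Feature selection enters when the table Q is replaced by a parametric approximation over a chosen basis; this tabular version is the exact baseline that such approximations try to scale up.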

Full Text

A neuro-dynamic programming approach to admission control in ATM networks: the single link case

We are interested in solving large-scale Markov Decision Problems. The classical method of Dynamic Programming provides a mathematical framework for finding optimal solutions for a given Markov Decision Problem. However, Dynamic Programming algorithms become computationally infeasible when the underlying Markov Decision Problem evolves over a large state space. In recent years, a new methodol...

Full Text

Reducing policy degradation in neuro-dynamic programming

We focus on neuro-dynamic programming methods to learn state-action value functions and outline some of the inherent problems to be faced when performing reinforcement learning in combination with function approximation. In an attempt to overcome some of these problems, we develop a reinforcement learning method that monitors the learning process, enables the learner to reflect whether it is b...

Full Text


Full Text
Journal Title:

Volume   Issue

Pages  -

Publication Date: 1995